    Segmentation-Based Bounding Box Generation for Omnidirectional Pedestrian Detection

    We propose a segmentation-based bounding box generation method for omnidirectional pedestrian detection that enables detectors to tightly fit bounding boxes to pedestrians without using omnidirectional images for training. Owing to their wide angle of view, omnidirectional cameras are more cost-effective than standard cameras and hence suitable for large-scale monitoring. The problem with using omnidirectional cameras for pedestrian detection is that the performance of standard pedestrian detectors is likely to be substantially degraded, because pedestrians in omnidirectional images may appear rotated to any angle. Existing methods mitigate this issue by transforming images during inference; however, the transformation substantially degrades detection accuracy and speed. A recently proposed method avoids the transformation by training detectors directly on omnidirectional images, which instead incurs huge annotation costs. To avoid both the transformation and the annotation work, we leverage an existing large-scale object detection dataset. We train a detector with rotated images and tightly fitted bounding box annotations generated from the segmentation annotations in the dataset, so that it detects pedestrians in omnidirectional images with tightly fitted bounding boxes. We also develop a pseudo-fisheye distortion augmentation, which further enhances performance. Extensive analysis shows that our detector successfully fits bounding boxes to pedestrians and achieves substantial performance improvement. (Preprint submitted to Multimedia Tools and Applications.)
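
    As a minimal sketch of the core idea (not the authors' code), the snippet below recomputes a tight box from a rotated segmentation mask instead of rotating the original axis-aligned box, which would loosen it. It assumes NumPy and, for simplicity, a 90-degree rotation; the paper rotates images to arbitrary angles.

```python
# Hedged sketch: derive a tight bounding box from a (rotated) binary
# segmentation mask. Helper names are illustrative, not from the paper.
import numpy as np

def tight_bbox_from_mask(mask: np.ndarray) -> tuple:
    """Return (x_min, y_min, x_max, y_max) tightly enclosing a binary mask."""
    ys, xs = np.nonzero(mask)
    return (xs.min(), ys.min(), xs.max(), ys.max())

# Usage: rotate the mask, then regenerate the box. Rotating the corners of
# the original upright box would instead yield a loose, padded box.
mask = np.zeros((100, 100), dtype=bool)
mask[20:80, 40:55] = True              # upright pedestrian silhouette
rotated = np.rot90(mask, 1)            # 90 degrees; the paper uses any angle
print(tight_bbox_from_mask(rotated))   # tight box in the rotated frame
```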

    Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

    This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images, in an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured by a small camera installed in a handset. Two kinds of lip features, lip-contour geometric features and lip-motion velocity features, are used individually or jointly, in combination with audio features. Phoneme HMMs modeling the audio and visual features are built based on the multistream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise at various SNRs show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information at all SNRs. These visual features were confirmed to be effective even when the audio HMM was adapted to noise by the MLLR method.
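
    The fusion step described here can be sketched as a stream-weighted score: in a multistream HMM, the per-state observation log-likelihood is a weighted sum of the audio and visual stream log-likelihoods. The weights below (0.7/0.3) are illustrative, not the paper's values.

```python
# Hedged sketch of multistream HMM observation scoring with single-Gaussian
# state densities; real systems use Gaussian mixtures and tuned weights.
import numpy as np
from scipy.stats import multivariate_normal

def multistream_score(audio_obs, visual_obs, audio_pdf, visual_pdf,
                      w_audio=0.7, w_visual=0.3):
    """log b_j(o) = w_a * log b_a(o_a) + w_v * log b_v(o_v)."""
    return (w_audio * audio_pdf.logpdf(audio_obs)
            + w_visual * visual_pdf.logpdf(visual_obs))

# Usage with illustrative state densities (13-dim audio, 4-dim lip features).
audio_pdf = multivariate_normal(mean=np.zeros(13), cov=np.eye(13))
visual_pdf = multivariate_normal(mean=np.zeros(4), cov=np.eye(4))
print(multistream_score(np.zeros(13), np.zeros(4), audio_pdf, visual_pdf))
```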

    Audio-Visual Speech Recognition Using New Lip Features Extracted from Side-Face Images

    This paper proposes new visual features for audio-visual speech recognition using lip information extracted from side-face images. To increase the noise robustness of speech recognition, we previously proposed an audio-visual speech recognition method using speaker lip information extracted from side-face images taken by a small camera installed in a mobile device. That method used only lip-movement information, measured by optical-flow analysis, as a visual feature. However, since lip-shape information is obviously also important, this paper combines lip-shape information with lip-movement information to improve audio-visual speech recognition performance. The angle between the upper and lower lips (lip angle) and its derivative are extracted as lip-shape features. The effectiveness of the lip-angle features has been evaluated under various SNR conditions. The proposed features improved recognition accuracy at all SNRs compared with audio-only recognition; the best improvement, 8.0% absolute, was obtained at 5 dB SNR. Combining the lip-angle features with our previous optical-flow features yielded further improvement. These visual features were confirmed to be effective even when the audio HMM used in our method was adapted to noise by the MLLR method.
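
    The abstract does not give the exact geometry of the lip angle, so the sketch below makes an assumption: the angle is measured at a mouth corner between rays to the upper- and lower-lip midpoints, and the derivative is a simple frame-to-frame difference.

```python
# Hedged sketch of a lip-angle feature and its delta; the landmark layout
# is assumed, not taken from the paper.
import numpy as np

def lip_angle(corner, upper_mid, lower_mid):
    """Angle (radians) at the mouth corner between upper- and lower-lip rays."""
    u, v = upper_mid - corner, lower_mid - corner
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Usage: a mouth opening over five frames gives a rising angle; stacking the
# angle with its delta yields a 2-dim per-frame shape feature.
angles = np.array([lip_angle(np.array([0.0, 0.0]),
                             np.array([1.0, h]),
                             np.array([1.0, -h]))
                   for h in (0.1, 0.2, 0.3, 0.4, 0.5)])
features = np.stack([angles, np.gradient(angles)], axis=1)
print(features)
```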

    Audio-Visual Speech Recognition Using Lip Movement Extracted from Side-Face Images

    This paper proposes an audio-visual speech recognition method using lip movement extracted from side-face images, in an attempt to increase noise robustness in mobile environments. Although most previous bimodal speech recognition methods use frontal face (lip) images, these methods are inconvenient because users must hold a device with a camera in front of their face while talking. Our proposed method, which captures lip movement with a small camera installed in a handset, is more natural, easy, and convenient. It also effectively avoids a decrease in the signal-to-noise ratio (SNR) of the input speech. Visual features are extracted by optical-flow analysis and combined with audio features in an HMM-based recognition framework, with phone HMMs built by the multi-stream HMM technique. Experiments conducted using Japanese connected digit speech contaminated with white noise at various SNRs show the effectiveness of the proposed method. Recognition accuracy is improved by using the visual information at all SNRs, with the best improvement of approximately 6% at 5 dB SNR.
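
    A minimal sketch of such a visual front end follows, assuming OpenCV's Farneback dense flow (the paper does not specify the flow algorithm): for each frame pair, the mean horizontal flow, mean vertical flow, and mean magnitude over the lip region form a small feature vector for the multi-stream HMM.

```python
# Hedged sketch: lip-movement features from dense optical flow.
import cv2
import numpy as np

def flow_features(prev_gray, curr_gray):
    """Mean x-flow, mean y-flow, and mean flow magnitude over a lip crop."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    return np.array([flow[..., 0].mean(), flow[..., 1].mean(), mag.mean()])

# Usage with a simulated 2-pixel horizontal lip shift between frames.
prev_frame = np.random.randint(0, 255, (64, 64), dtype=np.uint8)
curr_frame = np.roll(prev_frame, 2, axis=1)
print(flow_features(prev_frame, curr_frame))
```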

    First- and Second-Generation Practical Syntheses of a Chroman-4-one Derivative: A Key Intermediate for the Preparation of SERT/5-HT1A Dual Inhibitors

    Two approaches to the large-scale synthesis of the key intermediate 9, a precursor of novel SERT/5-HT1A dual inhibitors, are described. Each approach features a mild and efficient method for constructing the chroman-4-one scaffold that tolerates substrates containing base-sensitive functionalities and enables kilogram-scale synthesis without chromatographic purification. The first-generation synthesis enables quick delivery of a kilogram quantity of the key intermediate 9 with only one slurry purification step, whereas the highly practical second-generation synthesis is suitable for a multikilogram campaign.

    Cardiac sympathetic nervous system imaging with 123I-meta-iodobenzylguanidine: Perspectives from Japan and Europe

    Cardiac sympathetic nervous system dysfunction is closely associated with the risk of serious cardiac events in patients with heart failure (HF), including HF progression, pump-failure death, and sudden cardiac death from lethal ventricular arrhythmia. For cardiac sympathetic nervous system imaging, 123I-meta-iodobenzylguanidine (123I-MIBG) was approved by the Japanese Ministry of Health, Labour and Welfare in 1992 and has since been widely used in clinical settings. 123I-MIBG was later also approved by the Food and Drug Administration (FDA) in the United States of America (USA) and was expected to achieve broad acceptance. In Europe, 123I-MIBG is currently used only for clinical research. This review article is based on a joint symposium of the Japanese Society of Nuclear Cardiology (JSNC) and the American Society of Nuclear Cardiology (ASNC), held at the annual meeting of the JSNC in July 2016. JSNC members and a member of the ASNC discussed the standardization of 123I-MIBG parameters and clinical aspects of 123I-MIBG, with a view to further promoting 123I-MIBG imaging in Asia, the USA, Europe, and the rest of the world.
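
    For readers outside the field, the two planar 123I-MIBG parameters most often discussed in standardization efforts are the heart-to-mediastinum count ratio (HMR) and the washout rate. Definitions vary between institutions (background subtraction, decay correction); the sketch below uses one commonly reported background-subtracted form with illustrative numbers, not data from the symposium.

```python
# Hedged sketch of HMR and washout rate from planar ROI counts; the exact
# formulas and thresholds differ across standardization protocols.
def hmr(heart_mean, mediastinum_mean):
    """Heart-to-mediastinum ratio from mean counts per pixel in each ROI."""
    return heart_mean / mediastinum_mean

def washout_rate(h_early, m_early, h_late, m_late):
    """Percent loss of background-subtracted cardiac counts, early to late."""
    early, late = h_early - m_early, h_late - m_late
    return (early - late) / early * 100.0

# Usage with illustrative counts (early and ~4 h delayed images).
print(hmr(180.0, 90.0))                        # 2.0, within a typical range
print(washout_rate(180.0, 90.0, 140.0, 80.0))  # ~33% washout
```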